Lab 8

1. Execute the command "more /proc/cpuinfo" to discover things about your processor, such as the clock speed.

2. Read about the Intel rdtsc (ReaD Time Stamp Counter) instruction on wikipedia.org.

2. Make sure you understand how "long long" ints are handled on our 32-bit Linux machines. Compile and run the following program and note the format specifier "%llu".

    cis-lclient19:~/class/oct21>more onefile.c
    #include <stdio.h>

    unsigned long long rdtsc();

    int main(void)
    {
        unsigned long long mylong;

        mylong = rdtsc();
        printf("Unsigned long long is %llu\n", mylong);
        return 0;
    }

    unsigned long long rdtsc()
    {
        return 123456789101112ull;
    }

3. Separate onefile.c into main.c and rdtsc.c, create rdtsc.s, and make sure you understand the listing.

    cis-lclient01:~/2012/rdtsc>more rdtsc.s
            .file   "rdtsc.c"
            .text
    .globl rdtsc
            .type   rdtsc, @function
    rdtsc:
            pushl   %ebp
            movl    %esp, %ebp
            movl    $-2045822408, %eax
            movl    $28744, %edx
            popl    %ebp
            ret
            .size   rdtsc, .-rdtsc
            .ident  "GCC: (GNU) 4.4.5 20101112 (Red Hat 4.4.5-2)"
            .section        .note.GNU-stack,"",@progbits
    cis-lclient01:~/2012/rdtsc>

4. Strip rdtsc.s of the unnecessary instructions and insert the rdtsc instruction. Where does rdtsc leave the 64-bit counter? Where does a C function return a 64-bit unsigned integer?

5. Compile and execute the following main program with your new rdtsc.s function. The output should look something like the following:

    cis-lclient01:~/2012/rdtsc>more testrdtsc.c
    #include <stdio.h>

    unsigned long long rdtsc();

    int main(void)
    {
        unsigned long long a, b, c, d;

        a = rdtsc();
        b = rdtsc();
        c = rdtsc();
        d = rdtsc();
        printf("b - a = %llu\n", b - a);
        printf("c - b = %llu\n", c - b);
        printf("d - c = %llu\n", d - c);
        return 0;
    }
    cis-lclient01:~/2012/rdtsc>gcc testrdtsc.c rdtsc.s
    cis-lclient01:~/2012/rdtsc>./a.out
    b - a = 104
    c - b = 112
    d - c = 104
    cis-lclient01:~/2012/rdtsc>

6. Estimate the number of clock cycles needed to execute the rdtsc instruction. Describe in writing how you obtained your estimate. A good answer would use several methods. Show the output from your test program.
   (Hints: try a function with 10 rdtsc instructions, or a function with no rdtsc instructions.) How many clock cycles are required to call and return from a function?

For the remaining questions, use the following main program. It is generally a good idea to call the code you are timing at least two or three times. In many cases, the first measurement is larger because you are filling caches of various sorts (e.g. physical memory, level 2 cache, level 1 cache). Show the run times and try to explain them.

    #include <stdio.h>

    unsigned long long rdtsc();
    void test();

    int main(void)
    {
        unsigned long long a, b, c, d;

        a = rdtsc();
        test();
        b = rdtsc();
        test();
        c = rdtsc();
        test();
        d = rdtsc();
        printf("b - a = %llu\n", b - a);
        printf("c - b = %llu\n", c - b);
        printf("d - c = %llu\n", d - c);
        return 0;
    }

7. How long does it take to print a constant string?

    #include <stdio.h>

    void test()
    {
        printf("Hello World\n");
    }

8. How long does it take to set the elements of a 4 by 4 matrix?

    #define SIZE 4

    int matrix[SIZE][SIZE];

    void test()
    {
        int i, j;

        for (i = 0; i < SIZE; i++) {
            for (j = 0; j < SIZE; j++)
                matrix[i][j] = i + j;
        }
    }

9. Repeat part 8 setting SIZE equal to 16, 64, 256, and 1024.

10. Repeat part 9 using optimization level 3 (gcc -O3 main.c test.c rdtsc.s).

11. Repeat parts 8, 9, and 10 using a column scan (rather than a row scan) of matrix, e.g.

    #define SIZE 4

    int matrix[SIZE][SIZE];

    void test()
    {
        int i, j;

        for (i = 0; i < SIZE; i++) {
            for (j = 0; j < SIZE; j++)
                matrix[j][i] = i + j;
        }
    }

Note that running the code is trivial. The grading is determined by the insight shown by your answers.